tags: Adds [Excluded:WindowsDocker] tag #5355

Merged


claudiubelu
Contributor

@claudiubelu claudiubelu commented Dec 8, 2020

Some tests are currently tagged as [LinuxOnly], but they can run and pass on Windows containerd nodes, since containerd supports more features on Windows than Docker does. Ideally, we would run those tests in the Windows containerd jobs.

This PR proposes a different tag, [Excluded:WindowsDocker], for tests that can pass on Windows containerd but not on Windows Docker, so that it can be used in a regex to filter those specific tests.

This would be a temporary tag, since dockershim is getting removed in Kubernetes 1.24. After that, this tag will be removed as well.
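A minimal sketch of how such a tag could be used, assuming the Windows jobs filter tests with a Go-style skip regex (as Ginkgo's skip flag does). The two job regexes and the abbreviated test names below are hypothetical, not the actual CI configuration: a test retagged from [LinuxOnly] to [Excluded:WindowsDocker] is skipped by the Windows Docker job but selected by the Windows containerd job.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical skip regexes for two Windows CI jobs. The real job
// configurations are not part of this PR and may differ.
var (
	windowsDockerSkip     = regexp.MustCompile(`\[LinuxOnly\]|\[Excluded:WindowsDocker\]`)
	windowsContainerdSkip = regexp.MustCompile(`\[LinuxOnly\]`)
)

// runs reports whether a test survives a job's skip regex,
// i.e. whether the job would execute it.
func runs(skip *regexp.Regexp, testName string) bool {
	return !skip.MatchString(testName)
}

func main() {
	// Abbreviated, illustrative names: the first is retagged from [LinuxOnly]
	// to [Excluded:WindowsDocker]; the second stays [LinuxOnly].
	tests := []string{
		"should support subpaths with configmap pod [Excluded:WindowsDocker] [Conformance]",
		"should support seLinuxOptions [LinuxOnly]",
	}
	for _, name := range tests {
		fmt.Printf("docker job runs: %-5v containerd job runs: %-5v %s\n",
			runs(windowsDockerSkip, name), runs(windowsContainerdSkip, name), name)
	}
}
```

Under these assumptions, the first test is skipped only by the Docker job, while the second is still skipped by both.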

Which issue(s) this PR fixes:

Fixes #

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/developer-guide Issues or PRs related to the developer guide sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Dec 8, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 8, 2021
@BenTheElder
Member

/cc @spiffxp
/retitle WIP: tags: Adds [Excluded:WindowsDocker] tag

@k8s-ci-robot k8s-ci-robot changed the title WIP: tags: Adds [Excluded:WindowsDcoker] tag WIP: tags: Adds [Excluded:WindowsDocker] tag Mar 8, 2021
@claudiubelu claudiubelu changed the title WIP: tags: Adds [Excluded:WindowsDocker] tag tags: Adds [Excluded:WindowsDocker] tag Mar 8, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 8, 2021
@@ -450,6 +450,11 @@ to be eligible for this tag. This tag does not supersed any other labels.
(e.g.: seLinuxOptions) or is unable to run on Windows nodes, it is labeled
`[LinuxOnly]`. When using Windows nodes, this tag should be added to the
`skip` argument.
- `[Exclude:WindowsDocker]`: Windows Kubelet supports both Docker and Containerd,
Member

[Feature:NotSupportedByWindowsDocker] would mean we don't have to change our existing tag regexes, I think? I might be wrong.

I think this needs some changes with an eye toward how we use tags with conformance. Maybe just docs, maybe some of the verify code.

Contributor Author

Quite a lot of test jobs use \[Feature:.+\] as their skip regex, so we'd suddenly exclude these tests from all of those jobs, even though we shouldn't. Updating all those regexes would be much harder.
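A small sketch of the collision described here, using Go's standard regexp package and abbreviated, illustrative test names: the \[Feature:.+\] skip pattern matches a [Feature:NotSupportedByWindowsDocker] tag but not an [Excluded:WindowsDocker] tag.

```go
package main

import (
	"fmt"
	"regexp"
)

// featureSkip is the \[Feature:.+\] skip pattern mentioned above, which
// many jobs use to exclude feature-gated tests.
var featureSkip = regexp.MustCompile(`\[Feature:.+\]`)

func main() {
	// Abbreviated, illustrative test names carrying each candidate tag.
	candidates := []string{
		"should report termination message [Excluded:WindowsDocker] [Conformance]",
		"should report termination message [Feature:NotSupportedByWindowsDocker] [Conformance]",
	}
	for _, name := range candidates {
		fmt.Printf("skipped by Feature-skipping jobs: %-5v %s\n", featureSkip.MatchString(name), name)
	}
}
```

Only the second name is matched, so a [Feature:...] spelling would remove the test from every job that already skips features.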

Member

This is something of a general downside of [Feature] and the whole regex-tags approach.

Member

I don't think I like [Excluded:WindowsDocker] as a bikeshed (as in the linked Kubernetes PR), because it implies that the tests will be excluded automatically somehow (which is not the sort of logic conformance tests could even have, and some of these are conformance tests).

With Docker being deprecated, I think it's honestly reasonable to just maintain a list of excluded tests in the Windows Docker jobs until Docker is removed.

Contributor Author

AFAIK, dockershim is going to be removed in Kubernetes 1.24, so there's still some time until then. This tag would only be temporary, and it would be removed afterwards anyway.

Hm, I'd say the purpose of this tag is to extend the test suite we're running on Windows, rather than the other way around. Right now, those tests are being excluded for all Windows runs, by simply having the [LinuxOnly] tag. There are a couple of tests that can easily run and pass on Windows containerd since it has more features, which is why this tag is proposed, so we can run those tests on Windows containerd, while not including them in Windows docker runs.

Indeed, the alternative would be to remove the [LinuxOnly] tag and add to the skip regex a very long string representing that list of tests. But it would be simpler to just add [Excluded:WindowsDocker] to the skip regex.
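To make the trade-off concrete, here is a sketch of the list-based alternative, assuming the retagged tests keep their names minus [LinuxOnly]. `buildSkipRegex` is a hypothetical helper: because test names contain regex metacharacters such as `[` and `]`, each name has to be escaped with `regexp.QuoteMeta` before being joined into one alternation, whereas the tag-based pattern stays a single short term.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildSkipRegex joins exact test names into a single alternation. Test names
// contain regex metacharacters such as '[' and ']', so each one is escaped
// with regexp.QuoteMeta before being joined.
func buildSkipRegex(names []string) *regexp.Regexp {
	quoted := make([]string, len(names))
	for i, n := range names {
		quoted[i] = regexp.QuoteMeta(n)
	}
	return regexp.MustCompile(strings.Join(quoted, "|"))
}

func main() {
	// Two of the Conformance tests listed in this thread, with [LinuxOnly]
	// already dropped; a real list would be much longer.
	names := []string{
		"[sig-storage] Subpath Atomic writer volumes should support subpaths with configmap pod [Conformance]",
		"[sig-storage] Subpath Atomic writer volumes should support subpaths with secret pod [Conformance]",
	}
	listSkip := buildSkipRegex(names)
	tagSkip := regexp.MustCompile(`\[Excluded:WindowsDocker\]`)

	fmt.Println("list-based pattern length:", len(listSkip.String()))
	fmt.Println("tag-based pattern length: ", len(tagSkip.String()))
	fmt.Println("list matches first test:", listSkip.MatchString(names[0]))
}
```

The list-based pattern grows with every test added, and it must be kept in sync with test renames, which is the maintenance cost the tag avoids.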

Member

There are also some tests that sniff the runtime and auto skip, though not conformance.

I'd still bikeshed the name, at least; the current one sounds automatic.

Contributor Author

If we're bikeshedding, then what about:

  • [SkipWindowsDocker]
  • [PartialWindowsSupport]

Or, since it's related to additional containerd features:

  • [Containerd]
  • [ContainerdOnly] (not a fan of this, it would imply that it wouldn't work on Linux docker, which might not be true)

Member

I don't think we should start tagging with [Containerd]; Docker on Linux aside, there's also CRI-O, or future runtimes...

The first two sound OK. @spiffxp?

Member

It sounds like you're saying there are things [Windows] nodes cannot do with [Docker] but can do with [Containerd]. Is the situation the same for [Linux] nodes?

Trying to squish:

  • Node os capabilities
  • CRI capabilities
  • Node os implementation
  • CRI implementation

All into a single string tag (or taxonomy) is really showing this system's limits, again.

Thinking through my own bikeshed on this.

Contributor Author

I can't think of many such features at the moment; only one comes to mind: single-file mounts / mappings. That works on Linux using Docker, but it is known not to work on Windows using Docker (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md#windows--linux-considerations , see the Storage section). However, that scenario now works on Windows using containerd (we should actually update the docs for this as well). We had a couple of PRs that enabled single-file mappings for Windows containerd (example: kubernetes/kubernetes#83057). I have sent a PR that would enable some Conformance tests for Windows containerd as well: kubernetes/kubernetes#97045

It's not a very common scenario, and it mostly matters for Conformance tests, since we can't have any skip logic in them. It might come up more often if more tests are promoted to Conformance.

@spiffxp
Member

spiffxp commented Mar 8, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 8, 2021
@spiffxp
Member

spiffxp commented Jul 28, 2021

Can you help us understand the scope of the change here by listing the tests you would propose changing to follow this policy? I'm also trying to understand what this gains Windows users and vendors.

I assume this isn't proposing dropping [LinuxOnly] from all tests, as my impression is there are some things that Windows cannot fundamentally do. So then it sounds like we're tagging some tests as known to (not) work with a very specific (Node OS, CRI) configuration. That sets precedent for an explosion of combos along multiple dimensions, which I don't like.

We really need to have a broader conversation about how we're going to detect and select capabilities for a variety of cluster configurations, not just along the dimensions of Node OS and CRI... think CSI, CNI, and Cloud Provider too.

Having said all that, we have a [DisabledForLargeClusters] tag. I don't like it for the same reasons (now we've added cluster size as another dimension), but I could maybe see my way to [DisabledForWindowsDocker] if that's the last one. But I need to see the scope of what we're talking about here. And that still feels like a really slippery slope.

@SergeyKanzhelev
Member

BTW, if we proceed with the plan to remove dockershim in 1.24, this tag will not be needed, so we are talking about one release (unless plans change). I'm not sure bikeshedding is important here. It is definitely not ideal to run tests marked [LinuxOnly] on Windows. Perhaps an alternative would be to create a "proxy" test that just executes these tests, but not add it to Conformance for now. That way, the Conformance execution logic would not need to be extended to distinguish the runtime.

@claudiubelu
Contributor Author

Can you help us understand the scope of the change here by listing the tests you would propose changing to follow this policy? I'm also trying to understand what this gains Windows users and vendors.

Sure. The tests that are currently targeted are:

[k8s.io] Container Runtime blackbox test on terminated container should report termination message [LinuxOnly] as empty when pod succeeds and TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance]
[k8s.io] Container Runtime blackbox test on terminated container should report termination message [LinuxOnly] from file when pod succeeds and TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance]
[k8s.io] Container Runtime blackbox test on terminated container should report termination message [LinuxOnly] from log output if TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance]
[k8s.io] Container Runtime blackbox test on terminated container should report termination message [LinuxOnly] if TerminationMessagePath is set [NodeConformance]
[sig-storage] Subpath Atomic writer volumes should support subpaths with configmap pod [LinuxOnly] [Conformance]
[sig-storage] Subpath Atomic writer volumes should support subpaths with configmap pod with mountPath of existing file [LinuxOnly] [Conformance]
[sig-storage] Subpath Atomic writer volumes should support subpaths with downward pod [LinuxOnly] [Conformance]
[sig-storage] Subpath Atomic writer volumes should support subpaths with projected pod [LinuxOnly] [Conformance]
[sig-storage] Subpath Atomic writer volumes should support subpaths with secret pod [LinuxOnly] [Conformance]

These tests are targeted here: kubernetes/kubernetes#97045

There are a few more tests that I haven't checked yet, but that might also be applicable:

[k8s.io] Kubelet when scheduling a busybox Pod with hostAliases should write entries to /etc/hosts [LinuxOnly] [NodeConformance] [Conformance]
[k8s.io] KubeletManagedEtcHosts should test kubelet managed /etc/hosts file [LinuxOnly] [NodeConformance] [Conformance]
[sig-network] DNS should provide /etc/hosts entries for the cluster [LinuxOnly] [Conformance]
[sig-network] DNS should provide DNS for pods for Hostname [LinuxOnly] [Conformance]

Beyond that, there are many non-Conformance tests that relate to single-file mappings (which are supported on Windows containerd):

[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Dynamic PV (default fs)] subPath should support existing single file [LinuxOnly]
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Dynamic PV (default fs)] subPath should support file as subpath [LinuxOnly]
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Inline-volume (default fs)] subPath should support existing single file [LinuxOnly]
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Inline-volume (default fs)] subPath should support file as subpath [LinuxOnly]
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing single file [LinuxOnly]
[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Pre-provisioned PV (default fs)] subPath should support file as subpath [LinuxOnly]
... The list is much longer, since these tests are repeated for multiple drivers.
... Plus, last time I checked, we also had oxymoronic tests, like:
[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: block] [Testpattern: Dynamic PV (ntfs)][sig-windows] subPath should support existing single file [LinuxOnly]
[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: block] [Testpattern: Dynamic PV (ntfs)][sig-windows] subPath should support file as subpath [LinuxOnly]
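The oxymoronic combination above is mechanical enough to detect. A minimal sketch, assuming test names are available as plain strings; `contradictory` is a hypothetical helper for illustration, not part of any existing Kubernetes verify tooling:

```go
package main

import (
	"fmt"
	"strings"
)

// contradictory returns the test names that carry both [sig-windows] and
// [LinuxOnly]. This is a hypothetical check for illustration, not an
// existing Kubernetes verify script.
func contradictory(names []string) []string {
	var out []string
	for _, n := range names {
		if strings.Contains(n, "[sig-windows]") && strings.Contains(n, "[LinuxOnly]") {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	names := []string{
		"[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: block] [Testpattern: Dynamic PV (ntfs)][sig-windows] subPath should support existing single file [LinuxOnly]",
		"[sig-storage] In-tree Volumes [Driver: azure-disk] [Testpattern: Dynamic PV (default fs)] subPath should support existing single file [LinuxOnly]",
	}
	for _, n := range contradictory(names) {
		fmt.Println("contradictory tags:", n)
	}
}
```

Only the first name is reported; the azure-disk test is [LinuxOnly] but not tagged [sig-windows].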

We recently merged a bug fix for Subpath Atomic writer volumes on Windows (they were supposed to work for Windows containerd pods), so we'd also like to cover these cases to guarantee that we won't have regressions there: kubernetes/kubernetes#97642

There are other Windows containerd-only features that could benefit from such a tag, like the newly introduced HostProcess containers: kubernetes/kubernetes#99576. HostProcess containers can also have HostNetwork enabled, which could potentially enable other sets of tests (although it is debatable whether that is the same scenario).

I assume this isn't proposing dropping [LinuxOnly] from all tests, as my impression is there are some things that Windows cannot fundamentally do. So then it sounds like we're tagging some tests as known to (not) work with a very specific (Node OS, CRI) configuration. That sets precedent for an explosion of combos along multiple dimensions, which I don't like.

Correct; some things are simply incompatible between Linux and Windows, like SELinux, RunAsUser (although we have RunAsUserName), RunAsGroup, etc.

We really need to have a broader conversation about how we're going to detect and select capabilities for a variety of cluster configurations, not just along the dimensions of Node OS and CRI... think CSI, CNI, and Cloud Provider too.

Having said all that, we have a [DisabledForLargeClusters] tag. I don't like it for the same reasons (now we've added cluster size as another dimension), but I could maybe see my way to [DisabledForWindowsDocker] if that's the last one. But I need to see the scope of what we're talking about here. And that still feels like a really slippery slope.

[DisabledForWindowsDocker] sounds good to me. In the end, though, it's just a temporary tag that lets us easily improve our coverage. This tag would be dropped in 1.24, once dockershim is removed. We can add a TODO for this too.

BTW, if we proceed with the plan to remove dockershim in 1.24, this tag will not be needed, so we are talking about one release (unless plans change). I'm not sure bikeshedding is important here. It is definitely not ideal to run tests marked [LinuxOnly] on Windows. Perhaps an alternative would be to create a "proxy" test that just executes these tests, but not add it to Conformance for now. That way, the Conformance execution logic would not need to be extended to distinguish the runtime.

Correct. Hm, could you give an example of such a proxy test? I can't really picture it right now. As a test, it would still need a name that is regex-selectable somehow (e.g. via a label), and it should avoid running the same test twice in every other platform's CI.

Co-authored-by: Aaron Crickenberger <spiffxp@google.com>
Member

@spiffxp spiffxp left a comment

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: claudiubelu, spiffxp


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 7, 2021
@k8s-ci-robot k8s-ci-robot merged commit 7deec62 into kubernetes:master Oct 7, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/developer-guide Issues or PRs related to the developer guide cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
6 participants